Add GC content parameter --mingc --maxgc #30
Yeah, the performance is an issue. I will make it an optional parameter then.
I updated the code to check if |
I have made some changes to how you implemented this. Would you agree that this is equivalent to your code?
Yeah, I will run some benchmark using huge |
Hi @wdecoster I ran some tests on a 44G FASTQ file from Drosophila melanogaster ONT sequencing and found some interesting results.
I found that the GC filter does not significantly affect the run time. The |
Oh yes indeed, that is interesting. I agree that the GC calculation doesn't slow things down when it is not needed, so that is great. Out of curiosity, did you also benchmark it when you did specify a min or max GC? It seems gunzip does a far better job than the decompression done in Rust. I wonder if that is because the decompression then runs in a separate process with its output piped in, but I don't know how flate2 implements things. Things like this are also precisely why chopper initially only read from stdin: Unix tools are intended to do one thing well, and only one thing, and piping one tool into the next is a great approach. I will add a note regarding your findings to the README.
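To make the stdin-only design mentioned above concrete, here is a minimal sketch of a streaming FASTQ filter written against generic readers and writers. It is not chopper's actual code: the function names `filter_fastq` and `next_line` are hypothetical, and it assumes plain four-line FASTQ records with decompression handled by an upstream process.

```rust
use std::io::{self, BufRead, Write};

/// Pulls the next line, treating end of input in the middle of a record as an error.
fn next_line<I>(lines: &mut I) -> io::Result<String>
where
    I: Iterator<Item = io::Result<String>>,
{
    lines.next().unwrap_or_else(|| {
        Err(io::Error::new(
            io::ErrorKind::UnexpectedEof,
            "truncated FASTQ record",
        ))
    })
}

/// Streams four-line FASTQ records from `reader` to `writer`, keeping only
/// records whose sequence line satisfies `keep`. Returns the number kept.
/// (Hypothetical helper; assumes no multi-line sequences.)
fn filter_fastq<R, W, F>(reader: R, mut writer: W, keep: F) -> io::Result<usize>
where
    R: BufRead,
    W: Write,
    F: Fn(&str) -> bool,
{
    let mut lines = reader.lines();
    let mut kept = 0;
    while let Some(header) = lines.next() {
        let header = header?;
        let seq = next_line(&mut lines)?;
        let plus = next_line(&mut lines)?;
        let qual = next_line(&mut lines)?;
        if keep(&seq) {
            // Write the record back out unchanged.
            writeln!(writer, "{header}\n{seq}\n{plus}\n{qual}")?;
            kept += 1;
        }
    }
    Ok(kept)
}
```

In a binary, calling this with `io::stdin().lock()` and `io::stdout().lock()` would let a separate gunzip process do the decompression and pipe its output in, which is the arrangement discussed above.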
add note on speed of decompression while piping
Yeah I added the test for
Maybe I will deal with this problem the |
Hi @wdecoster,
Recently, I added the GC content filter parameters --mingc and --maxgc for use in my own project. I also added a testGC.fastq for testing under /chopper/test-data. The filter is very useful when dealing with ONT sequencing of high-GC bacteria.
Hope you like my modifications.
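For readers skimming this thread, a GC content filter of this kind can be sketched roughly as follows. This is an illustration rather than the code in this PR: the names `gc_fraction` and `passes_gc_filter` are hypothetical, and it assumes the bounds are fractions between 0.0 and 1.0 and are inclusive. Leaving both bounds unset skips the GC calculation entirely, in line with the idea of making the parameter optional.

```rust
/// Fraction of G/C bases in a read, in the range 0.0..=1.0.
/// An empty sequence is reported as 0.0 (an arbitrary choice for this sketch).
fn gc_fraction(seq: &[u8]) -> f64 {
    if seq.is_empty() {
        return 0.0;
    }
    let gc = seq
        .iter()
        .filter(|b| matches!(**b, b'G' | b'C' | b'g' | b'c'))
        .count();
    gc as f64 / seq.len() as f64
}

/// Returns true when the read passes the optional GC bounds.
/// Assumption: bounds are inclusive fractions. With both bounds unset,
/// the GC fraction is never computed.
fn passes_gc_filter(seq: &[u8], min_gc: Option<f64>, max_gc: Option<f64>) -> bool {
    if min_gc.is_none() && max_gc.is_none() {
        return true;
    }
    let gc = gc_fraction(seq);
    min_gc.map_or(true, |min| gc >= min) && max_gc.map_or(true, |max| gc <= max)
}
```

For example, `passes_gc_filter(b"GGCA", Some(0.5), Some(0.8))` accepts a read with a GC fraction of 0.75, while `passes_gc_filter(b"ATAT", Some(0.5), None)` rejects a read with no G or C bases.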